Abstract

In a recent seminal paper, Gibson and Wexler ([1], GW) take important steps to formalizing the notion of language learning in a (finite) space whose grammars are characterized by a finite number of parameters. One of their aims is to characterize the complexity of learning in such spaces. For example, they demonstrate that even in finite spaces, convergence may be a problem since it is possible under some single-step gradient ascent methods to remain at a local maximum. From the standpoint of learning theory, however, GW leave open several questions that can be addressed by a more precise formalization in terms of Markov structures (a possible formalization suggested but left unpursued in a footnote of GW). In this paper we explicitly formalize learning in a finite parameter space as a Markov structure whose states are parameter settings. Several important results that follow directly from this characterization include: (1) a corrected version of GW's central convergence proof; (2) an explicit formula for calculating the transition probabilities between hypotheses and the existence of "problem states" in addition to local maxima; (3) an explicit calculation of the time needed to converge, in terms of number of (positive) examples; (4) the convergence and comparison of several variants of the GW learning procedure, e.g., random walk; (5) batch- and PAC-style learning bounds for the model.


Copyright © Massachusetts Institute of Technology, 1993 


This report describes research done within the Center for Biological and Computational Learning in the Department of Brain and Cognitive Sciences, and at the Artificial Intelligence Laboratory. This research is supported by NSF grant 9217041-AC and by ARPA under the HPCC program. Correspondence by e-mail could be directed to pn@ai.mit.edu or berwick@ai.mit.edu.


1 Introduction: The Triggering Model as a Markov structure


Recently, Gibson and Wexler ([1], GW) have begun to formalize the notion of language learning in a (finite) space whose grammars (and languages) are characterized by a finite number of parameters, or n-dimensional Boolean-valued arrays. A grammar in this space is simply a particular n-length array of 0's and 1's; hence there are $2^n$ possible grammars (languages). One of Gibson and Wexler's aims is to establish that under some simple hill-climbing learning regime, namely, single-step gradient ascent, some linguistically natural, finite spaces are unlearnable, in the sense that positive-only examples lead to local maxima: incorrect hypotheses from which a learner can never escape. More broadly, they wish to show that learnability in such spaces is still an interesting problem, in that there is a substantive learning theory concerning feasibility, convergence time, and the like, that must be addressed beyond traditional linguistic theory and that might even choose between otherwise adequate linguistic theories.

In this paper, we choose as a convenient starting point their Triggering Learning Algorithm (TLA) to focus our investigation of parameter learning. Our central result is that the performance of this algorithm is completely modeled by a Markov chain. The remainder of the current paper is devoted to exploring the basic consequences of this fact.

Let us first review the GW model and the TLA. Following Gold [2] the basic framework is that of identification in the limit. The learner (child) starts out in an arbitrary state, i.e., some setting of the n parameters. The learner (child) receives a (countably infinite) sequence of positive example sentences drawn from some target language. After each presentation, the learner can either (i) stay in the same state; or (ii) move to a new hypothesis state, using the algorithm given below. If after some finite number of examples the learner converges to the correct target language (= parameter settings) and never changes state, then it has correctly identified the target language; otherwise, it does not converge.

In addition, in the GW model the language learner obeys two fundamental constraints: (1) the single-value constraint: the learner can change only 1 parameter value at a time; and (2) the greediness constraint: if the learner is given a positive example it cannot recognize (accept), and if the learner changes one parameter value and finds that it can accept the example, then the learner retains that new parameter value. Finally, we also recall GW's definition of a local trigger (minor notational changes aside): given values for all parameters but one, a local trigger for value v of parameter p, p(v), is a sentence s from the target grammar such that s is grammatical iff the value of p is v. GW then state their TLA as follows:

• [Initialize] Step 1. Start at some random point in the (finite) space of possible parameter settings, specifying a single hypothesized grammar with its resulting extension as a language;

• [Process input sentence] Step 2. Receive a positive example sentence $s_i$ at time $t_i$ (examples drawn from the language of a single target grammar), from a uniform distribution on the language (we shall be able to relax this distributional constraint later on);

• [Learnability on error detection] Step 3. If the current grammar parses (generates) $s_i$, then go to Step 2; otherwise, continue.

• [Single-step gradient-ascent] Select a single parameter at random, uniformly with probability 1/n, to flip from its current setting, and change it (0 mapped to 1, 1 to 0) iff that change allows the current sentence to be analyzed; otherwise go to Step 2;

Of course, this algorithm never halts in the usual sense. GW aim to show under what conditions this algorithm converges "in the limit", that is, after some number, n, of steps, where n is unknown, the correct target parameter settings will be selected and never be changed. Their central claim is stated as their Theorem 1 (p. 7 in their manuscript).

Theorem 1 As long as the probability is always greater than a lower bound b, b > 0, that the learner will 1) encounter a local trigger for some incorrectly-set parameter P, and 2) then reset P accordingly to the target value, it turns out that the target grammar can always be learned using the Triggering Learning Algorithm.

Footnote 1: Note that the notion of trigger does not enter into the statement of the TLA or the constraints the TLA employs, only into the statement of the theorem.

1.1 The Markov formulation

From the standpoint of learning theory, however, GW leave open several questions that can be addressed by a more precise formalization of this model in terms of Markov chains (a possible formalization suggested but left unpursued in footnote 9 of GW). We can picture the hypothesis space, of size $2^n$, as a set of points, each corresponding to one particular array of parameter settings (languages, grammars). Call each point a hypothesis state or simply state of this space. As is conventional, we define these languages over some alphabet $\Sigma$ as a subset of $\Sigma^*$. One of them is the target language (grammar). We arbitrarily place the (single) target grammar at the center of this space. Since by the TLA the learner is restricted to moving at most 1 binary value in a single step, the theoretically possible transitions between states can be drawn as (directed) lines connecting parameter arrays (hypotheses) that differ by at most 1 binary digit (a 0 or a 1 in some corresponding position in their arrays). Recall that this is the so-called Hamming distance.

We may further place weights on the transitions from state i to state j corresponding to the nonzero b's mentioned in the theorem above; these correspond to the probabilities that the learner will move from hypothesis state i to state j. In fact, as we shall show below, given a distribution over L(G), we can further carry out the calculation of the actual b's themselves.
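As a rough illustration of the hypothesis space and the TLA steps reviewed above, the following Python sketch enumerates the $2^n$ parameter arrays, lists the single-flip (Hamming distance 1) neighbors of a state, and performs one TLA update. The function names and the `parses` oracle are assumptions introduced here for illustration; they do not come from GW or from this paper, and a real grammar would need its own parser.

```python
import random
from itertools import product


def hypothesis_space(n):
    """Return all 2**n parameter arrays as tuples of 0/1."""
    return list(product((0, 1), repeat=n))


def hamming(a, b):
    """Number of positions in which two parameter arrays differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(state):
    """States reachable by flipping exactly one parameter (single-value constraint)."""
    return [state[:i] + (1 - state[i],) + state[i + 1:] for i in range(len(state))]


def tla_step(state, sentence, parses, rng=random):
    """One TLA update on a single positive example.

    `parses(state, sentence) -> bool` is a hypothetical oracle that says
    whether the grammar with these parameter values analyzes the sentence.
    """
    if parses(state, sentence):
        return state                      # Step 3: no error, keep the hypothesis
    i = rng.randrange(len(state))         # pick one parameter with probability 1/n
    candidate = state[:i] + (1 - state[i],) + state[i + 1:]
    # Greediness: keep the flip only if the sentence becomes analyzable.
    return candidate if parses(candidate, sentence) else state
```

With n = 3 this yields the 8-state space used in the example later in the paper; repeated calls to `tla_step` on a stream of sentences trace one random path through that space.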



Thus, we can picture the TLA learning space as a directed, labeled graph V with $2^n$ vertices (footnote 2). More precisely, we can make the following remarks about the TLA system GW describe.

Remark. The TLA system is memoryless, that is, given a sequence s of sentences up to time $t_i$, the selection of hypothesis h depends only on sentence $s(t_i)$ and not (directly) on previous sentences, i.e.,

$$P\{x(t_i) \le x_i \mid x(t),\ t \le t_{i-1}\} = P\{x(t_i) \le x_i \mid x(t_{i-1})\}$$

In other words, the TLA system is a classical discrete stochastic process, in particular, a discrete Markov process or Markov chain. We can now use the theory of Markov chains to describe TLA parameter spaces [3]. For example, as is well known, we can convert the graphical representation of an n-dimensional Markov chain M to an n x n matrix T, where each matrix entry (i, j) represents the transition probability from state i to state j. Successive steps of the Markov process are computed via matrix multiplication (two steps by T x T); n steps is given by $T^n$. A "1" entry in any cell (i, j) means that the system will converge with probability 1 to state j, given that it starts in state i.

As mentioned, not all these transitions will be possible in general. For example, by the single value hypothesis, the system can only move 1 Hamming bit at a time. Also, by assumption, only differences in surface strings can force the learner from one hypothesis state to another. For instance, if state i corresponds to a grammar that generates a language that is a proper subset of another grammar hypothesis j, there can never be a transition (nonzero b) from j to i, and there must be one from i to j. Further, by assumption and the TLA, it is clear that once we reach the target grammar there is nothing that can move the learner from this state, since all remaining positive evidence will not cause the learner to change its hypothesis. Thus, there must be a loop from the target state to itself, with some positive label b and no exit arcs. In the Markov chain literature, this is known as an Absorbing State (AS). Obviously, a state that only leads to an AS will also drive the learner to that AS. Finally, if a state corresponds to a grammar that generates some sentences of the target there is always a loop from any state to itself, that has some nonzero probability. Clearly, one can conclude at once the following learnability result:

Theorem 2 Given a Markov chain C corresponding to a GW TLA learner, there exists exactly 1 AS (corresponding to the target grammar/language) iff C is learnable.

Proof. (<=). By assumption, C is learnable. Now assume for sake of contradiction that there is not exactly one AS. Then there must be either 0 AS or > 1 AS. In the first case, by the definition of an absorbing state, there is no hypothesis in which the learner will remain forever. Therefore C is not learnable, a contradiction. In the second case, without loss of generality, assume there are exactly two absorbing states, the first S corresponding to the target parameter setting, and the second S' corresponding to some other setting. By the definition of an absorbing state, in the limit C will with some nonzero probability enter S', and never exit S'. Then C is not learnable, a contradiction. Hence our assumption that there is not exactly 1 AS must be false.

(=>). Assume that there exists exactly 1 AS i in the Markov chain M. Then, by the definition of an absorbing state, after some number of steps n, no matter what the starting state, M will end up in state i, corresponding to the target grammar.

Footnote 2: GW construct an identical transition diagram in the description of their computer program for calculating local maxima. However, this diagram is not explicitly presented as a Markov structure; it does not include transition probabilities. Of course, topologically both structures must be identical.

Note that this approach avoids a crucial flaw in the proof given in GW (pp. 7-8 in manuscript):

    That is, if the learner never goes through the same state twice, then she is bound to end in the target state at some point, because the parameter space is finite in size. Thus the probability of avoiding the target state forever is equivalent to the probability of cycling forever through some ordered set of states (a cycle).

    We can divide the parameter space into a finite set of minimal cycles, where each minimal cycle contains no cycles as a subpart. Because the parameter space is finite, the set of minimal cycles in the parameter space is also finite. For each minimal cycle, we can now calculate the probability that the learner remains in that cycle forever ... the probability of staying in the [minimal pn/rcb] cycle in the limit (forever) is zero. The same is true for all of the finitely-many minimal cycles, so that the probability of staying in any of these cycles in the limit is also zero. Thus the probability of ending up at the target state in the limit is one.

In brief, GW attempt to show that the probability of the learner avoiding the target forever is zero by showing that the fact that some minimal cycle occurs infinitely often makes the probability of the infinite sequence zero. In other words, every way in which the learner avoids the target has probability zero. Thus they conclude that the probability of the event

    Event = Learner avoids target forever

is zero. More precisely, they claim:

$$\Pr\Big[\bigcup_a W_a\Big] = 0$$

where each $W_a$ is a path avoiding the target and $\bigcup_a W_a$ is the set of such paths. However, as is well known, this probability computation is true iff it is taken over a countable set of elements. In the example at hand, the crucial omission in the argument is that there are an uncountable number of ways in which the learner can avoid the target. This is because there are an uncountable number of sequences of numbers between 1 and M-1. The base M-1 expansion of any real number in the
interval [0, 1) would yield such a sequence (e.g., consider an irrational expansion such as the square root of 2). Since there are an uncountable number of ways in which the event of avoiding the target forever can be realized, the fact that each such way has probability zero does not imply that the total event has probability zero as well. To see this consider a random variable X with a uniform distribution on [0, 1]. Now consider the

    Event: X < 1/2

There are many ways in which this event could occur, e.g., X = 1/4, X = 1/3, X = 0.234, etc. Each of these ways has probability zero, i.e., P[X = 1/4] = 0, P[X = 1/3] = 0, and so on. However, we know that the probability of the event X < 1/2 is 1/2, not zero. This is because there are an uncountable number of ways in which the event X < 1/2 could take place. Thus the proof as given in [1] is incorrect. The correct way to formulate the proof is by resorting to an explicit Markov formulation as suggested but not executed in GW's footnote 9, and as we established above. A similar conceptual difficulty seemingly leads to their failure to note that there are other states besides local maxima, for which convergence may not occur.

Corollary 1 Given a Markov chain corresponding to a (finite) family of grammars in a GW learning system, if there exist 2 or more AS, then that family is not learnable.

Example.

Consider the GW 3-parameter system. Its binary parameters are: (1) Spec(ifier) first (0) or last (1); (2) Comp(lement) first (0) or last (1); and (3) Verb Second (V2) does not exist (0) or does exist (1). By Specifier, GW follow the standard linguistic convention of taking it to be the part of a phrase that "specifies" that phrase, roughly, like the old in the old book; by Complement GW roughly mean a phrase's arguments, like an ice-cream in John ate an ice-cream or with envy in green with envy. There are also 7 possible words in this language: S, V, O, O1, O2, Adv, and Aux, corresponding to Subject, Verb, Object, Direct Object, Indirect Object, Adverb, and Auxiliary verb. There are 12 possible surface strings for each (-V2) grammar and 18 possible surface strings for each (+V2) grammar if we restrict ourselves to unembedded or "degree-0" examples for reasons of psychological plausibility (see GW for discussion). Note that the "surface strings" of these languages are actually phrases such as Subject, Verb, and Object. Figure (3) of GW summarizes the possible binary parameter settings in this system. For instance, parameter setting (5) corresponds to the array [0 1 0] = Specifier first, Comp last, and -V2, which works out to the possible basic English surface phrase order of Subject-Verb-Object (SVO). As shown in GW's figure (3), the other possible arrangements of surface strings corresponding to this parameter setting include S V; S V O1 O2 (two objects, as in give John an ice-cream); S Aux V (as in John is a nice guy); S Aux V O; S Aux V O1 O2; Adv S V (where Adv is an Adverb, like quickly); Adv S V O; Adv S V O1 O2; Adv S Aux V; Adv S Aux V O; and Adv S Aux V O1 O2.

Suppose SVO (setting #5 = [0 1 0]) is the target grammar (language). With the GW 3-parameter system, there are $2^3 = 8$ possible hypotheses, so we can draw this as an 8-point Markov configuration space, as shown in the figure above. The shaded rings represent increasing Hamming distances from the target. Each labeled circle is a Markov state, a possible array of parameter settings or grammar, hence extensionally specifies a possible target language. Each state is exactly 1 binary digit away from its possible transition neighbors. Each directed arc between the points is a possible (nonzero) transition from state i to state j; we shall show how to compute this immediately below. We assume that the target grammar, a double circle, lies at the center. This corresponds to the (English) SVO language. Surrounding the bulls-eye target are the 3 other parameter arrays that differ from [0 1 0] by one binary digit each; we picture these as a ring 1 Hamming bit away from the target: [0 1 1], corresponding to GW's parameter setting #6 in their figure 3 (Spec-first, Comp-final, +V2, basic order SVO+V2); [0 0 0], corresponding to GW's setting #1 (Spec-first, Comp-first, -V2), basic order SOV; and [1 1 0], GW's setting #7 (Spec-final, Comp-final, -V2), basic order VOS.

Around this inner ring lie 3 parameter setting hypotheses, all 2 binary digits away from the target: [0 0 1], [1 0 0], and [1 1 1] (grammars #2, 3, and 8 in GW figure 3). Note that by the Single Value hypothesis the learner can only move one grey ring towards or away from the target at any one step. Finally, one more ring out, three binary digits different from the target, is the hypothesis [1 0 1], corresponding to target grammar 4.

It is easy to see from inspection of the figure that there are exactly 2 absorbing states in this Markov chain, that is, states that have no exit arcs. One AS is the target grammar (by definition). The other AS is state 2. Note that state 4 is also a sink (a so-called "closed state" in Markov terminology) that leads only to state 4 or state 2. These two states correspond to the local maxima at the head of figure 4. Hence this system is not learnable. In addition to these local maxima, the next section below shows that there are in fact other states from which the learner can never reach the target.

2 Derivation of Transition Probabilities for the Markov TLA Structure

The computation of the transition probabilities from the language family can be computed by a direct extension of the procedure given in GW. Let the target language $L_t$ consist of the strings $s_1, s_2, \ldots$, i.e.,

$$L_t = \{s_1, s_2, s_3, \ldots\}$$

Let there be a probability distribution P on these strings. Suppose the learner is in a state corresponding to the language $L_s$. Suppose it now receives the string $s_j$. This it does so with probability $P(s_j)$. There are two cases to examine depending upon whether or not the string $s_j$ is analyzable by the grammar corresponding to the current parameter setting.

Case I. Suppose the learner can syntactically analyze the received string $s_j$. By the TLA, it will not change its
parameter values. In the Markov chain formulation, the learner remains in the same state. Remember that this state corresponds to the language $L_s$. Also note that this situation arises only when $s_j$ is in the language $L_s$. Therefore the probability of the learner remaining in its original state s is $P(s_j)$.

Case II. Suppose the learner cannot syntactically analyze the string. Then $s_j \notin L_s$. By the TLA the learner chooses a parameter at random, flips it, and if the new parameter setting makes $s_j$ analyzable, it retains the new value and moves to the corresponding state; otherwise it remains in its original state s. Let us examine this situation using the Markov chain formulation. The learner is in state s. It has n neighboring states, each at a Hamming distance of 1 from itself. The learner picks one of these uniformly at random. Imagine that $n_j$ of these neighboring states correspond to languages which contain $s_j$. If the learner picks any one of those states (which of course it does with probability $n_j/n$), then it would stay in that state. If the learner picks any of the other states (with probability $(n - n_j)/n$) then it remains in state s. Note that $n_j$ of course could be 0, which means that none of the neighboring states would allow the string to be analyzed. The maximum value $n_j$ could take is n. Thus we see that the probability that the learner remains in state s is $P(s_j)\,((n - n_j)/n)$. The probability that it moves to each of the other $n_j$ states is $P(s_j)\,(1/n)$.

Clearly this allows us to compute the probability that the learner will remain in its original state s as the sum of the probabilities of the above two cases, namely the following expression:

$$P[s \to s] = \sum_{s_j \in L_s} P(s_j) + \sum_{s_j \notin L_s} \Big(1 - \frac{n_j}{n}\Big) P(s_j)$$

The above expression is still a little untidy because of the $n_j$'s in it. We would like to clean it up a little. To do this consider the way we would compute the transition probability of state s to some other neighboring state, say k, in the chain. From the above analysis, we see that such a transition will occur with probability 1/n for all the strings $s_j$ that are in the language $L_k$ but not in the language $L_s$. The strings themselves occur with probability $P(s_j)$ each and so the transition probability is:

$$P[s \to k] = \sum_{s_j \in L_t,\ s_j \notin L_s,\ s_j \in L_k} \frac{1}{n} P(s_j)$$

Note that the above summation is done over all strings $s_j \in (L_t \cap L_k) \setminus L_s$, where $\setminus$ is the set difference symbol. It is easy to see that

$$s_j \in (L_t \cap L_k) \setminus L_s \iff s_j \in (L_t \cap L_k) \setminus (L_t \cap L_s).$$

Thus we can rewrite the transition probability as

$$P[s \to k] = \sum_{s_j \in (L_t \cap L_k) \setminus (L_t \cap L_s)} \frac{1}{n} P(s_j)$$

Since we have shown this in generality, for any given target we can compute the transition probabilities between any two states in the Markov chain formulation of the parameter space. The self-transition probability can now be given as

$$P[s \to s] = 1 - \sum_{k \text{ is a neighboring state of } s} P[s \to k]$$

Given any parameter space with n parameters, we have $2^n$ languages. Fixing one of them as the target language $L_t$, we obtain the following procedure for constructing the corresponding Markov chain. Note that this is the GW procedure for finding local maxima, with the addition of a probability measure on the language family.

• (Assign distribution) First fix a probability measure P on the strings of the target language $L_t$.

• (Enumerate states) Assign a state to each language, i.e., each $L_i$.

• (Normalize by the target language.) Intersect all languages with the target language to obtain, for each i, the language $L'_i = L_i \cap L_t$. Thus with state i associated with language $L_i$, we now associate the language $L'_i$.

• (Take set differences.) Now for any two states i and k, if they are more than 1 Hamming distance apart, then the transition $P[i \to k] = 0$. If they are 1 Hamming distance apart then $P[i \to k] = \frac{1}{n} P(L'_k \setminus L'_i)$.

This model captures the dynamics of the TLA completely.

Example.

Consider again the 3-parameter system in the previous figure with target language 5. We can calculate the following set differences to build the Markov figure straightforwardly.

1. $L_1 \cap L_5 = \emptyset$ (no strings in common between $L_1$ and target $L_5$).
2. $L_2 \cap L_5$ = [...]
3. $L_3 \cap L_5 = \emptyset$.
4. $L_4 \cap L_5$ = {S V, S V O, S Aux V}.
6. $L_6 \cap L_5$ = {S V, S V O, S V O1 O2, S Aux V, S Aux V O, S Aux V O1 O2}.
7. $L_7 \cap L_5$ = {S V, Adv S V}.
8. $L_8 \cap L_5$ = {S V, S V O, S Aux V}.

From these values alone, we can draw the figure illustrated, and find the local maxima. For example, since the normalized state set for state 1 is the empty set, the set difference between states 1 and 5 gives all of the target language; so there is a (high) transition probability from state 1 to state 5. Similarly, since states 7 and 8 share some target language strings in common, such as S V, and do not share others, such as Adv S V and S V O, the learner can move from state 7 to 8 and back again.

Many additional properties of the triggering learning system now become evident once the mathematical formalization has been given.
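The construction procedure above (assign distribution, enumerate states, normalize by the target, take set differences) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are introduced here, and the 2-parameter language family `TOY_LANGUAGES` is a hypothetical toy example, not GW's 3-parameter system, whose full string tables are not reproduced in this section.

```python
from fractions import Fraction


def transition_matrix(languages, target, prob):
    """Build the TLA Markov chain as a dict-of-dicts T[s][k].

    languages: dict mapping each state (tuple of 0/1) to its set of strings
    target:    the state whose language is the target L_t
    prob:      dict mapping each string of L_t to its probability (Fractions)
    """
    states = sorted(languages)
    n = len(target)
    L_t = languages[target]
    # Normalize by the target language: L'_i = L_i intersected with L_t.
    norm = {s: languages[s] & L_t for s in states}
    T = {s: {k: Fraction(0) for k in states} for s in states}
    for s in states:
        for k in states:
            if sum(a != b for a, b in zip(s, k)) == 1:   # Hamming distance 1
                diff = norm[k] - norm[s]                 # L'_k \ L'_s
                T[s][k] = Fraction(1, n) * sum((prob[x] for x in diff), Fraction(0))
        # Self-transition: one minus the total probability of leaving s.
        T[s][s] = 1 - sum(T[s][k] for k in states if k != s)
    return T


def absorbing_states(T):
    """States with self-transition probability 1 (no exit arcs)."""
    return [s for s in T if T[s][s] == 1]


# Hypothetical toy family with n = 2 parameters; target is state (0, 0).
TOY_LANGUAGES = {
    (0, 0): {"a", "b"},
    (0, 1): {"a"},
    (1, 0): {"c"},
    (1, 1): {"c", "d"},
}
TOY_PROB = {"a": Fraction(1, 2), "b": Fraction(1, 2)}   # uniform on L_t
TOY_T = transition_matrix(TOY_LANGUAGES, (0, 0), TOY_PROB)
```

In this toy family the target is the only absorbing state, so by Theorem 2 the family is learnable. Exact `Fraction` arithmetic is used so that the test for a self-transition of exactly 1 is not affected by floating-point rounding.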



It is easy to imagine other alternatives to the TLA that will avoid the local maxima problem. For example, as it stands the learner only changes a parameter setting if that change allows the learner to analyze the sentence it could not analyze before. If we relax this condition so that in this situation the learner picks a parameter at random to change, then the problem with local maxima disappears, because there can be only 1 Absorbing State, namely the target grammar. All other states have exit arcs. Thus, by our main theorem, such a system is learnable.

Or consider for example the possibility of noise, that is, occasionally the learner gets strings that are not in the target language. GW state (fn. 4, p. 5) that this is not a problem; the learner need only pay attention to frequent data. But this is of course a serious problem for the model. Unless some kind of memory or frequency-counting device is added, the learner cannot know whether the example it receives is noise or not. This being so, then there is always some finite probability, however small, of escaping a local maximum. It appears that the identification in the limit framework as given is simply incompatible with the notion of noise, unless a memory window of some kind is added.

[...] whether any local maxima exist. One could also look at other issues (like stationarity or ergodicity assumptions) that might potentially affect convergence. Later we will consider several variants to TLA and see how these can be formally analyzed within the Markov formulation. We will also see that these variants do not suffer from the local maxima problem associated with GW's TLA.

Perhaps the significant advantage of the Markov chain formulation is that it allows us to also analyze convergence times. Given the transition matrix of a Markov chain, the problem of how long it takes to converge has been well studied. This question is of crucial importance in learnability. Following GW, we believe that it is not enough to show that the learning problem is consistent, i.e., that the learner will converge to the target in the limit. We also need to show, as GW point out, that the learning problem is feasible, i.e., the learner will converge in "reasonable" time. This is particularly true in the case of finite parameter spaces where consistency might not be as much of a problem as feasibility. The Markov formulation allows us to attack the feasibility question. It also allows us to clarify the assumptions about the behavior of data and learner inherent in such an attack.

W nay now pr oceed to as k t he f ol 1 owi ng quest i orW begi n by cons i der i ng a f e w ways i n whi ch one coul d 
about t he TLA mo r eprecisely: for mil at e t he quest i on of conver gence t i mes . 


1. Ebes it converge? 3. 1 Son® TFansi ti on Mtri ces and Their 

2. Howfast does it converge? Howdoes this varywith Gbnvergence Qirves 

di s t r i but i onal as s unpt i ons on t he i nput exampl^g^ ? us p e g} n by f ol 1 owi ng t he pr ocedur e det ai 1 ed i n t he 

3. Clan we nowconput e t he dynani cs for ot her “nat upr evi ous sect i on to act ual 1 y obt ainafewtr ansi t i on na- 

r al ” par amet er sys t errs , 1 i ke t he 10- par amet er tgrys&es . Cbnsi der t he exanpl e whi ch we 1 ooked at i nf or- 
temfor the acqui sitionof stress ini anguage s nteVfe^-i n t he previ ous secti on. Here the target grannar 
oped by [ 4] ? was grannar 5 and t he L 1 anguages have al ready been 

, -it • a ctta ij j i ,, , obtai ned. For s i npl i ci t y, let us first ass ume a uni f or m 

4. Var l ants ot ILAwoui d cor r es pond t o ot her IVar kov . G 

. . n. ,i o t r i p.Mistri but i on on t he s t r l ngs ini , t he pr obabi f 11 y t he 

structures. Lb t hey conver ge : If so, howt as t ; , . . J 


learner sees a parti cul ar s^riim^ga s 1/12 because 

5. Howdoes the convergence t i me seal e up wi t h t|ig ere are 12 (degree- 0) stri ngs iW £an nowcom 

number of parameters? pute the transition matrix as the following, where 0’s 

6. Wat i s t he comput at i onal compl exi t y of 1 ear ifihfgupy nat rixentries if not ot her wi s e s peci fied: 
par amet r i zed 1 anguage f ani lies? 


7. Wat happens if we move from on-line to batch 
learning? Chn we get PAG s t yl e bounds [6]? 

8. Wat does it me an t o have non- s t at i onar y ( none r - 
godic) Mir kov structures? Howdoes this relate to 
assumptions about parameter ordering and nat u- 

r ation? 

9. Wat other par amet r i zati ons can we consider? 


L 1 
L'J 
C3 

u 

L 5 

Le 


I n t he re nai nde r of t his paper we s hal1 c ons i de r t he s e 
and other questions. W turn first to the question of 
c onve r ge nc e and c onve rgence times. 


Li L 2 L 3 L 4 L 5 L 6 L 7 Lg 


12 


11 

12 


12 


36 


Gbnvergence Times for the Mrkov 
Chai n Mdel 


Notice that both 2 and 5 correspond to absorbing 
states; thus this chain suffers from the local maxi rra 
problem Note also (following the previous figure as 
The Mir kov chain formulation gives us some di s t i raet 1 ) that state 4 only exits to either itself or to stat 
advant ages intheoretically c har acterizing the 1 a2}grbB^pc e is also a local maxi mim M>re precisely, if T 
acqui si ti on probl em Fi rst, we have al ready seeri hefihe transi ti on pr obabi 1 i ty rnatri x of a cha^qi, then t 
gi ven a Mir kov Chai n one coul d i nves t i gat e whet heriore . t he el erne nt of T i n t he i t h r ow and j t h col umn i s 
not it has exactly one abs or bi ng s t at e corresponditrij^tprobabi 1 i ty that the learner moves fromstate i to 


t he t ar get gr amrar . Thi s i s equi val ent t o t he que^.t isdrabf j i n one step. It is awell - known f act t hat i f one 



considers the corresponding $(i, j)$ element of $T^m$ then this is the probability that the learner moves from state $i$ to state $j$ in $m$ steps. For learnability to hold irrespective of which state the learner starts in, the probability that the learner reaches state 5 should tend to 1 as $m$ goes to infinity. This means that column 5 of $T^m$ should contain all 1's, and the matrix should contain 0's everywhere else. Actually we find that $T^m$ converges to the following matrix as $m$ goes to infinity:

[Limiting matrix not reproduced.]

Examining this matrix we see that if the learner starts out in states 2 or 4, it will certainly end up in state 2 in the limit. These two states correspond to local maxima grammars in the GW framework. If the learner starts in either of these two states, it will never reach the target. From the matrix we also see that if the learner starts in states 5 through 8, it will certainly converge in the limit to the target grammar.

The situation regarding states 1 and 3 is more interesting. If the learner starts in either of these states, it will reach the target grammar with probability 2/3 and reach state 2, the other absorbing state, with probability 1/3. Thus we see that local maxima are not the only problem for learnability. GW (p. 26 in manuscript) focuses exclusively on local maxima, and indirectly implies that these are the only difficult states: "most of the source grammars have local triggers that enable the learner to get to the target... however, there exist pairs of source and target grammars from the parameter space given in the table in Figure 3, such that no data from the target grammar will ever shift the learner out of the source grammar... There are six such pairs of source local maximum and target grammars". They then go on to list in their figure 4 two such local maxima for the target grammar 5, corresponding to states 2 and 4.

While this statement is strictly true, it does not exhaust the set of source states that never lead to the target grammar. As we see from the transition matrix, while it is true that states 2 and 4 will, with probability 1, not converge to the target grammar, it is also true that states 1 and 3 are not guaranteed to converge to the target: they fail to do so with probability 1/3. Thus the number of "bad" initial hypotheses is significantly larger than that presented in Figure 4 of GW. This difference is again due to the new probabilistic framework introduced in the current paper, and in fact is related to the difficulty found earlier with the central convergence proof: looking just at minimal paths and cycles in fact misses some possible learning paths. In the appendix of this paper, we provide a complete list of all starting states that might result in non-learnability. While the implication of the existence of additional non-learnable starting states is not clear, presumably the issue of learnability even in this 3-parameter case deserves re-examination in light of this possibility.

Obviously one can examine other details of this particular system. However, let us now look at a case where there is no local maxima problem. This is the case when the target languages have verb-second (V2) movement in GW's 3-parameter case. Consider the transition matrix obtained when the target language is $L_1$. Again we assume a uniform distribution on strings of the target.

[Transition matrix for target $L_1$; entries not recoverable.]

Here we find that $T^m$ does indeed converge to a matrix with 1's in the first column and 0's elsewhere. Consider the first column of $T^m$. It is of the form

$$(p_1(m), p_2(m), \ldots, p_8(m))^T$$

where $p_i(m)$ denotes the probability of being in state 1 at the end of $m$ examples in the case where the learner started in state $i$. Naturally we want

$$\lim_{m \to \infty} p_i(m) = 1$$

and for this example this is indeed the case. The next figure shows a plot of the following quantity as a function of $m$, the number of examples:

$$p(m) = \min_i \{ p_i(m) \}$$

The quantity $p(m)$ is easy to interpret. Thus $p(m) = 0.95$ means that for every initial state of the learner the probability that it is in the target state after $m$ examples is at least 0.95. Further there is one initial state (the worst initial state with respect to the target) for which this probability is exactly 0.95. We find on looking at the curve that the learner converges with high probability within 100 to 200 (degree-0) example sentences, a psychologically plausible number. (One can now of course proceed to examine actual transcripts of child input to calculate convergence times for "natural" distributions of examples, and we are currently engaged in this effort.)

As one example of the power of this approach, we can also compare the convergence time of TLA to other algorithms. Perhaps the simplest is random walk: start the learner at a random point in the 3-parameter space,


and then, if an input sentence cannot be analyzed, move randomly from state to state. Note that this regime does not suffer from the local maxima problem, since there is always some finite probability of exiting a non-target state.

To satisfy the reader's curiosity, we provide the convergence curves for a random walk algorithm (RWA) on the 8 state space. We find that the convergence times are actually faster than for the TLA; see figure 2. Since the RWA is also superior in that it does not suffer from the same local maxima problem as TLA, the conceptual support for the TLA is by no means clear. Of course, it may be that the TLA has empirical support, in the sense of independent evidence that children do use this procedure (given by the pattern of their errors, etc.), but this evidence is lacking, as far as we know.

Now that we have made a first attempt to quantify the convergence time, several other questions can be raised. How does convergence time depend upon the distribution of the data? How does it compare with other kinds of Markov structures with the same number of states? How will the convergence time be affected if the number of states increases, i.e. the number of parameters increases? How does it depend upon the way in which the parameters relate to the surface strings? Are there other ways to characterize convergence times? We now proceed to answer some of these questions.

3.2 Distributional Assumptions

In the earlier section we assumed that the data was uniformly distributed. We computed the transition matrix for a particular target language and showed that convergence times were of the order of 100-200 samples. In this section we show that the convergence times depend crucially upon the distribution. In particular we can choose a distribution which will make the convergence time as large as we want. Thus the distribution-free convergence time for the 3-parameter system is infinite.

As before, we consider the situation where the target language is $L_1$. There are no local maxima problems for this choice. We begin by letting the distribution be parametrized by the variables $a, b, c, d$ where

a = P(A = {Adv V S})
b = P(B = {Adv V O S, Adv Aux V S})
c = P(C = {Adv V O1 O2 S, Adv Aux V O S, Adv Aux V O1 O2 S})
d = P(D = {V S})

Thus each of the sets $A$, $B$, $C$ and $D$ contains different degree-0 sentences of $L_1$. Clearly the probability of the set $L_1 \setminus \{A \cup B \cup C \cup D\}$ is $1 - (a + b + c + d)$. The elements of each defined subset of $L_1$ are equally likely with respect to each other. Setting positive values for $a, b, c, d$ such that $a + b + c + d < 1$ now defines a unique probability for each degree-0 sentence in $L_1$. For example, the probability of Adv V O S is $b/2$, the probability of Adv Aux V O S is $c/3$, that of V O S is an equal share of the remaining probability $1 - (a + b + c + d)$, and so on.

We can now obtain the transition matrix corresponding to this distribution. This is shown in Table 1. Compare this matrix with that obtained with a uniform distribution on the sentences of $L_1$ in the earlier section. This matrix has non-zero elements (transition probabilities) exactly where the earlier matrix had non-zero elements. However, the value of each transition probability now depends upon $a$, $b$, $c$, and $d$. In particular if we choose $a = 1/12$, $b = 2/12$, $c = 3/12$, $d = 1/12$ (this is equivalent to assuming a uniform distribution) we obtain the appropriate transition matrix as before. Looking more closely at the general transition matrix, we see that the transition probability from state 2 to state 1 is $(1 - (a + b + c))/3$. Clearly if we make $a$ arbitrarily close to 1, then this transition probability is arbitrarily close to 0 so that the number of samples needed to converge can be made arbitrarily large. Thus choosing large values for $a$ and small values for $b$ will result in large convergence times.

This means that the sample complexity cannot be bounded in a distribution-free sense, because by choosing a highly unfavorable distribution the sample complexity can be made as high as possible. For example, we now give the convergence curves calculated for different choices of $a, b, c, d$. We see that for a uniform distribution the convergence occurs within 200 samples. By choosing a distribution with $a = 0.9999$ and $b = c = d = 0.000001$, the convergence time can be pushed up to as much as 50 million samples. (Of course, this distribution is presumably not psychologically realistic.) For $a = 0.99$, $b = c = d = 0.0001$, the sample complexity is on the order of 100,000 positive examples.

3.3 Absorption Times

In the previous sections, we computed the transition matrix for a variety of distributions and showed the rate of convergence. In particular we plotted $p(m)$ (the probability of converging from the most unfavorable initial state) against $m$, the number of samples. However, this is not the only way to characterize convergence times. Given an initial state, the time taken to reach the absorption state (known as the absorption time) is a random variable. One can compute the mean and variance of this random variable. For the case when the target language is $L_1$ we have seen that the transition matrix has the form

$$T = \begin{pmatrix} 1 & 0 \\ R & Q \end{pmatrix}$$

Here $Q$ is a 7-dimensional square matrix. The mean absorption times from states 2 through 8 are given by the vector (see Isaacson and Madsen [3])

$$\mu = (I - Q)^{-1} \mathbf{1}$$

where $\mathbf{1}$ is a 7-dimensional column vector of ones. The vector of second moments is given by

$$\mu' = (I - Q)^{-1} (2\mu - \mathbf{1})$$

Using this result, we can now compute the mean and standard deviation of the absorption time from the most unfavorable initial state of the learner. (We note that the second moment is fairly skewed in such cases and so is not symmetric about the mean, as may be seen from the previous curves.)
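The two absorption-time formulas follow from conditioning on the learner's first step. The short derivation below is a sketch; the symbol $\tau_i$ (the number of examples until absorption when starting from transient state $i$) is introduced here for illustration and is not notation from the paper.

```latex
% One example is always consumed; afterwards the learner is in transient
% state j with probability Q_{ij}, or has been absorbed.
\begin{align*}
\mu_i  &= E[\tau_i]   = 1 + \sum_j Q_{ij}\,\mu_j
        &&\Longrightarrow\quad (I - Q)\,\mu = \mathbf{1}, \\
\mu'_i &= E[\tau_i^2] = 1 + 2\sum_j Q_{ij}\,\mu_j + \sum_j Q_{ij}\,\mu'_j
        &&\Longrightarrow\quad (I - Q)\,\mu' = \mathbf{1} + 2Q\mu .
\end{align*}
% Since (I - Q)\mu = \mathbf{1} gives Q\mu = \mu - \mathbf{1}, the right-hand
% side becomes \mathbf{1} + 2(\mu - \mathbf{1}) = 2\mu - \mathbf{1}, hence
\[
\mu' = (I - Q)^{-1}\,(2\mu - \mathbf{1}),
\qquad
\operatorname{Var}(\tau_i) = \mu'_i - \mu_i^2 .
\]
```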



Learning scenario     Mean abs. time     St. dev. of abs. time
TLA (uniform)         34.8               22.3
TLA (a = 0.99)        45000              33000
TLA (a = 0.9999)      4.5 x 10^6         3.3 x 10^6
RW                    9.6                10.1
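The means and standard deviations of absorption times come from solving two linear systems in $I - Q$. The following is a minimal sketch in plain Python, assuming only that the transient block $Q$ is available as a list of rows. The function names and the two-state chain `TOY_Q` are hypothetical illustrations; the paper's own 7 x 7 block is not reproduced here.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copying keeps the caller's data untouched.
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("I - Q is singular: some state never reaches absorption")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def absorption_stats(q):
    """Return (mean, std) of the absorption time from each transient state.

    q is the transient-to-transient block Q of the transition matrix.
    """
    n = len(q)
    i_minus_q = [[(1.0 if i == j else 0.0) - q[i][j] for j in range(n)]
                 for i in range(n)]
    # (I - Q) mu = 1 gives the mean absorption times.
    mean = solve_linear(i_minus_q, [1.0] * n)
    # (I - Q) mu' = 2 mu - 1 gives the second moments.
    second = solve_linear(i_minus_q, [2.0 * m - 1.0 for m in mean])
    std = [math.sqrt(max(s - m * m, 0.0)) for s, m in zip(second, mean)]
    return mean, std


# Hypothetical chain (NOT the paper's matrix): the first transient state is
# absorbed with probability 0.5 per example; the second moves to the first
# with probability 0.5 and otherwise stays put.
TOY_Q = [[0.5, 0.0],
         [0.5, 0.5]]
toy_mean, toy_std = absorption_stats(TOY_Q)
```

For the toy chain the absorption time from the first state is geometric with success probability 0.5, and from the second state it is the sum of two such waits, which gives an easy hand check on the solver.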


3.4 Eigenvalue Rates of Convergence

In classical Markov chain theory, there are known convergence theorems derived from a consideration of the eigenvalues of the transition matrix. We state without proof a convergence result for transition matrices stated in terms of its eigenvalues.

Theorem 3 Let $T$ be an $n \times n$ transition matrix with $n$ linearly independent left eigenvectors $x_1, \ldots, x_n$ corresponding to eigenvalues $\lambda_1, \ldots, \lambda_n$. Let $x_0$ (an $n$-dimensional vector) represent the starting probability of being in each state of the chain and $\pi$ be the limiting probability of being in each state. Then after $k$ transitions, the probability of being in each state, $x_0 T^k$, can be described by

$$\| x_0 T^k - \pi \| = \Big\| \sum_{i=2}^{n} \lambda_i^k \, x_0 y_i x_i \Big\| \le \max_{2 \le i \le n} \{ |\lambda_i|^k \} \sum_{i=2}^{n} \| x_0 y_i x_i \|$$

where the $y_i$'s are the right eigenvectors of $T$.

This theorem thus bounds the rate of convergence to the limiting distribution $\pi$ (in cases where there is only one absorption state, $\pi$ will have a 1 corresponding to that state and 0 everywhere else). Using this result we can now bound the rates of convergence (in terms of number $k$ of samples) by:

Learning scenario     Rate of convergence
TLA (uniform)         O(0.94^k)
TLA (a = 0.99)        O((1 - 10^-4)^k)
TLA (a = 0.9999)      O((1 - 10^-6)^k)
RW                    O(0.89^k)

This theorem also helps us to see the connection between the number of examples and the number of parameters since a chain with $n$ states (corresponding to an $n \times n$ transition matrix) represents a language family with $\log(n)$ parameters.

4 Batch Learning Upper and Lower Bounds: An Aside

So far we have discussed a memoryless learner moving from state to state in parameter space and hopefully converging to the correct target in finite time. As we saw this was well-modeled by our Markov formulation. In this section however we step back and consider upper and lower bounds for learning finite language families if the learner was allowed to remember all the strings encountered and optimize over them. Needless to say this might not be a psychologically plausible assumption, but it can shed light on the information-theoretic complexity of the learning problem.

Consider a situation where there are $n$ languages $L_1, L_2, \ldots, L_n$ over an alphabet $\Sigma$. Each language can be represented as a subset of $\Sigma^*$:

$$L_i = \{ \omega_{i1}, \omega_{i2}, \ldots \}$$

The learner is provided with positive data (strings that belong to the language) drawn according to distribution $P$ on the strings of a particular target language. The learner is to identify the target. It is quite possible that the learner receives strings that are in more than one language. In such a case the learner will not be able to uniquely identify the target. However, as more and more data becomes available, the probability of having received only ambiguous strings becomes smaller and smaller and eventually the learner will be able to identify the target uniquely. An interesting question to ask then is how many samples the learner needs to see so that with high confidence it is able to identify the target, i.e. the probability that after seeing that many samples, the learner is still ambiguous about the target is less than $\delta$. The following theorem provides a lower bound.

Theorem 4 The learner needs to draw at least

$$M = \max_{j \ne t} \frac{1}{\ln(1/p_j)} \ln(1/\delta)$$

samples (where $p_j = P(L_t \cap L_j)$) in order to be able to identify the target with confidence greater than $1 - \delta$.

Proof. Suppose the learner draws $m$ (less than $M$) samples. Let $k = \arg\max_{j \ne t} p_j$. This means 1) $M = \frac{1}{\ln(1/p_k)} \ln(1/\delta)$ and 2) that with probability $p_k$ the learner receives a string which is in both $L_k$ and $L_t$. Hence it will be unable to discriminate between the target and the $k$th language. After drawing $m$ samples, the probability that all of them belong to the set $L_t \cap L_k$ is $(p_k)^m$. In such a case even after seeing $m$ samples, the learner will be in an ambiguous state. Now $(p_k)^m > (p_k)^M$ since $m < M$ and $p_k < 1$. Finally since $M \ln(1/p_k) = \ln((1/p_k)^M) = \ln(1/\delta)$, we see that $(p_k)^m > \delta$. Thus the probability of being ambiguous after $m$ examples is greater than $\delta$, which means that the confidence of being able to identify the target is less than $1 - \delta$. ∎

This simple result allows us to assess the number of samples we need to draw in order to be confident of correctly identifying the target. Note that if the distribution of the data is very unfavorable, that is, the probability of receiving ambiguous strings is quite high, then the number of samples needed can actually be quite large.

While the previous theorem provides the number of samples necessary to identify the target, the following theorem provides an upper bound for the number of samples that are sufficient to guarantee identification with high confidence.

Theorem 5 If the learner draws more than

$$M = \frac{1}{\ln(1/(1 - b_t))} \ln(1/\delta)$$

samples, then it will identify the target with confidence greater than $1 - \delta$. (Here $b_t = P(L_t \setminus \bigcup_{j \ne t} L_j)$.)

Proof. Consider the set $L_t \setminus \bigcup_{j \ne t} L_j$. Any element of this set is present in the target language $L_t$ but not in any other language. Consequently upon receiving such a string, the learner will be able to instantly identify the target. After $m > M$ samples, the probability that the learner has not received any member of this set


is $(1 - P(L_t \setminus \bigcup_{j \ne t} L_j))^m = (1 - b_t)^m < (1 - b_t)^M = \delta$. Hence the probability of seeing some member of $L_t \setminus \bigcup_{j \ne t} L_j$ in those $m$ samples is greater than $1 - \delta$. But seeing such a member enables the learner to identify the target, so the probability that the learner is able to identify the target is greater than $1 - \delta$ if it draws more than $M$ samples. ∎

To summarize, this section provides a simple upper and lower bound on the sample complexity of exact identification of the target language from positive data. The $\delta$ parameter that measures the confidence of the learner of being able to identify the target is suggestive of a PAC [6] formulation. However there is a crucial difference. In the PAC formulation, one is interested in an $\epsilon$ approximation to the target language with at least $1 - \delta$ confidence. In our case, this is not so. Since we are not allowed to approximate the target, the sample complexity shoots up with choice of unfavorable distributions.

There are some interesting directions one could follow within this batch learning framework. One could try to get true PAC style distribution-free bounds for various kinds of language families. Alternatively one could use the exact identification results here for linguistically plausible language families with reasonable probability distributions on the data. It might be an interesting exercise to recompute the bounds for cases where the learner receives both positive and negative data. Finally the bounds obtained here could be sharpened further. We intend to look into some of these questions in the future.

5 Variants of the Learning Model

We have so far focused on the TLA scheme for learning. TLA observes the single value and greediness constraints. There could be several variants of this learning algorithm and many of these are captured completely by our Markov formulation. We consider the following three simple variants by dropping either or both of the Single Value and Greediness constraints:

Random walk with neither greediness nor single value constraints: We have already seen this example before. The learner is in a particular state. Upon receiving a new sentence, it remains in that state if the sentence is analyzable. If not, the learner moves uniformly at random to any of the other states and stays there waiting for the next sentence. This is done without regard to whether the new state allows the sentence to be analyzed.

Random walk with no greediness but with single value constraint: The learner remains in its original state if the new sentence is analyzable. Otherwise the learner chooses one of the parameters uniformly at random and flips it, thereby moving to an adjacent state in the Markov structure. Again this is done without regard to whether the new state allows the sentence to be analyzed. However since only one parameter is changed at a time, the learner can only move to neighboring states at any given time.

Random walk with no single value constraint but with greediness: The learner remains in its original state if the new sentence is analyzable. Otherwise the learner moves uniformly at random to any of the other states and stays there iff the sentence can be analyzed. If the sentence cannot be analyzed in the new state the learner remains in its original state.

Fig. 4 shows the convergence times for these three algorithms when $L_1$ is the target language. Interestingly, all three perform better than the TLA for this task. Further, they do not suffer from local maxima problems. It should be pointed out, however, that the differences from TLA are marginal and this convergence has been shown only for $L_1$ as the target language. Ideally the convergence rates have to be computed for each target language and then either a worst case or average case rate should be decided upon to characterize the convergence times of the algorithm on the language family as a whole.

6 Conclusion, Open Questions, and Future Directions

As the number of parameters $n$ increases, the size of the corresponding Markov matrix grows as $2^n \times 2^n$. Thus in the case of a 10 parameter system as found in models of English stress ([4]) the corresponding Markov structure will be a 1024 x 1024 matrix. We are currently conducting an analysis of this larger system to find its local maxima, analyze its convergence times, and see if its convergence times correspond to what one might find in practice with real stress systems.

Additional questions remain to be answered. One issue has to do with the smoothness relation between the parameter settings and the resulting surface strings. In principles-and-parameters theory, it has often been suggested that a small parameter change could lead to a large deductive change in the grammar, hence a large change in the surface language generated. In all the examples considered so far there is a smooth relation between surface sentences and parameters, in that switching from a V2 to a non-V2 system, for instance, leads us to a Markov state that is not too far away from the previous one. If this is not so, it is not so clear that the TLA will work as before. In fact, the whole question of how to formulate the notion of "smoothness" in a language-grammar framework is unclear. We know in the case of continuous functions, for example, that if the learner is allowed to choose examples (which can be simulated by selective attention) then such an "active" learner can approximate such functions much more quickly than a "passive" learner, like the one presented here. Is there an analog to this in the discrete, digital world of language? How can one approximate a language? Here too mathematics may play a helpful role. Recall that there is an analog to a functional analysis of languages, namely, the algebraic approach advanced by Chomsky and Schutzenberger ([5]). In this model, a language is described by an (infinite) polynomial generating function, where the coefficient on the polynomial term $x$ gives the number of ways of deriving the string $x$. A (weak, string) approximation to a language can then be defined in terms of an approximation to the generating function. If this method can be deployed,



t hen one ni ght be abl e to car r y over t he r es ul t s of f unc-gr amrar i s (VC6- V2) . For cas es when t he t ar- 
t i onal anal ys i s and appr oxi rrat i on f or act i ve vs . pas s i get is 1 ear nabl e , t he 1 ear ner conver ges to t he t ar get 
1 ear ner s i nt o t he “di gi t al ” dorrai n of 1 anguage . If thin 100- 200 s anpl es wi t h hi gh ( gr eat er t han 0.99) 
i s possi bl e , ve woul d then have a very powerful set ofprobabi 1 i t y. Further , the vari ants of the TLA al 1 
pr e vi ous 1 y under ut i 1 i zed rrat he rrat i c al t ool s t o anal yzaut per f or mt he TLAi n t er ns of conver gene e t i ms . 

1 anguage 1 e ar nabi 1 i t y. 
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9217041-ASC and ARPAunder the HPCCprogram 


A Lear nabl e Granmrs: The Fill 1 Story 

A 1 Pr obi emSt at es 

W provide in Table 2 a conplete list of problems tat[e g ], N Chonsky and M Schutzenberger , The Age- 
In ot her wor ds we 1 i s t al 1 the initial starting gr ammg ra i c iheor y of tint ext - fr ee Languages . Cbnputer 
target graramr pairs for whi ch the learner is not guar ftograrali ng and Foriml System, North Hoi 1 and, 
ant eed t o conver ge t o t he t ar get wi t h pr obabi1it y 1. ter dan} 1963, 53-77. 

f act, ass uni ng a uni f ormdi stri but i on on the stri ng.si or ^, Trn . . . pit ii tt p 

the target grammr, it is possible to conpute the pM-. L G Valiant Atheo of t he Learnable. Proc. of 

i*i*i p i • i . i . pr i, c u the 1984 S IOC, 19 o 4, 436- 445 

abi 1 1 1 y ot not conver gi ng t o t he t ar get t or each ot t hes e ’ ’ 

pai rs . Note that t hi s pr obabi 1 i ty i s non- zero for the pai rs 

1 i sted. 


A 2 Riiarks 

1. W have pr ovi de d a c onpl e t e list of initial start- 
i ng gr amrar s f r omwhi ch s one t ar get is not 1 ear n- 
abl e (i . e. 1 earnabl e wi t h probabi 1 i ty 1) . W no- 
tice that there are three kinds of such problem 
starting states. Some states correspond to sinks 
i n t he Mir kov Structure with respect to some tar¬ 
get grammr. Here the learner gets stuck, never 
leaves it and correspondingly never converges to 
the target. Then there are states which are not 
sinks (OVS+V2 when the target is SVO-V2) but 

whi ch can onl y rrove t o s ome non- t ar get s i nk, and 
so never converge to the target. These tw kinds 
of pr obi ems tates (starredin our t abl e) have been 
listed by Q bson and Wxl er i n Fi g. 4 (pg. 27 of 
nanus c r i pt). Fi nal lythere are states whi chare not 
sinks, but which can with a non zero probability 
converge to some non-target sink. They can also 
wi t h a non- zero pr obabi 1 i t y conver ge to t he t ar get 
and i n t hi s respect are di s t i ngui s hed f r ompr obi em 
s t at es of t ype 2. 

2. We would like to observe that of the 56 possible initial
grammar-target grammar combinations, 12 result in non-learnable
situations in the 3-parameter system investigated here. This is a
fairly high density of unfavourable initial configurations. It would
be interesting to see how this changes with other linguistic
subsystems with a larger number of parameters.

3. We also did an analysis of convergence times under a uniform
distribution for each target grammar. We find that the results are
similar to the results displayed in the paper for the case when the target




Figure 1: The 8 parameter settings in the GW example, shown as a Markov structure, with transition probabilities
omitted. (Without transition probabilities, this diagram corresponds exactly to that in GW's appendix, as mentioned
above.) Directed arrows between circles (states) represent possible nonzero (possible learner) transitions. The target
grammar (in this case, number 5, setting [0 1 0]) lies at dead center. Around it are the three settings that differ
from the target by exactly one binary digit; surrounding those are the 3 hypotheses two binary digits away from the
target; the third ring out contains the single hypothesis that differs from the target by 3 binary digits.
The learner can either cycle or step in or out one ring (binary digit) at a time, according to the single-step
hypothesis; but some transitions are not possible because there is no data to drive the learner from one state to the
other under the TLA.

Table 1: Transition matrix corresponding to a parametrized choice for the distribution on the target strings. In this
case the distribution is parametrized according to Section 3.2.

Figure 2: Convergence as a function of the number of examples. The horizontal axis denotes the number of examples
received and the vertical axis represents the probability of converging to the target state. The data from the target
is assumed to be distributed uniformly over degree-0 sentences. The solid line represents TLA convergence
and the dotted line is a random walk learning algorithm (RWA). Note that random walk actually converges faster
than the TLA in this case.







Table 2: Complete list of problem states, i.e., all combinations of starting grammar and target grammar which result
in non-learnability of the target. The items marked with an asterisk are those listed in the original paper by Gibson
and Wexler [1].